(54) A URI rewriting pseudo proxy server

(57) A method for real-time remapping of access to
a selected remote domain in an interconnected computer
system network, comprising the steps of defining a
pseudo proxy server and translating, in the pseudo proxy
server, a remote record identifier corresponding to the
remote domain to a remapped record identifier
corresponding to the local domain. In a further
enhancement, the method comprises the additional step of
determining if a selected record identifier is a selected
remapped record identifier.

Description

FIELD OF THE INVENTION

This invention relates to the field of interconnected
computers, and more particularly to the field of formatted
data distributed on interconnected computers.

BACKGROUND OF THE INVENTION 

Because the Internet evolved from the ARPAnet, a
research experiment that supported the exchange of
data between government contractors and (often
academic) researchers, an online culture developed that
is alien to the corporate business world. The Internet was
not designed to make commercialization easy.

Domain names direct where e-mail is sent, files are
found, and computer resources are located. They are
used when accessing information on the WWW or
connecting to other computers through Telnet. Internet
users enter the domain name, which is automatically
converted to the Internet Protocol (IP) address by the
Domain Name System (DNS). The DNS is a service of
the TCP/IP protocol suite that translates the symbolic
name into an IP address by looking up the domain name
in a database.

The World Wide Web (WWW) is one of the newest 
Internet services. The WWW allows a user to access a 
universe of information which combines text, audio, 
graphics and animation within a hypermedia document. 
Links are contained within a WWW document which 
allows simple and rapid access to related documents. 
The WWW was developed to provide researchers with a
system that would enable them to quickly access all
types of information through a common interface,
removing the need to perform a number of separate steps
to access the information. In 1991, the WWW was
released for general use with access to hypertext and
UseNet news articles. Interfaces to WAIS, anonymous
FTP, Telnet and Gopher were added. By the end of 1993,
WWW browsers with easy-to-use interfaces had been
developed for many different computer systems.

With HyperText Markup Language (HTML) based
pages, such as those of the WWW, the pages of information
contain pointers to other pages. The pointers are links
which are encoded with Uniform Resource Locators
(URLs). The URL contains a transmission protocol,
such as HyperText Transfer Protocol (HTTP), a domain
name of the target computer system, and a page
identifier.
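
By way of illustration only, the three URL parts just described can be separated with Python's standard `urllib.parse.urlsplit`. The function name `url_parts` is hypothetical, and the sample URL is the Regional Paper example used later in this description.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> tuple:
    """Return (transmission protocol, domain name, page identifier) for a URL."""
    parts = urlsplit(url)
    # scheme -> protocol, hostname -> domain name, path -> page identifier
    return parts.scheme, parts.hostname or "", parts.path


print(url_parts("http://regional-paper.com/regional-today.html"))
# ('http', 'regional-paper.com', '/regional-today.html')
```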

Accordingly, with the commercialization of the Internet
through advertising, charging for access to information,
and other schemes, there is a need for an Internet
Service Provider (ISP) to record all of the interactions
that its customers have with HTML based content.

SUMMARY OF THE INVENTION 

In an interconnected computer system network,
there is provided a method for real-time remapping of a
remote domain to a local domain. The method comprises
the steps of defining a pseudo proxy server and
translating, in the pseudo proxy server, a remote record
identifier corresponding to the remote domain to a
remapped record identifier corresponding to the local
domain. In a further enhancement, the method comprises
the additional step of determining if a selected record
identifier is a selected remapped record identifier.
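
The remapping summarized above can be sketched in Python as a pair of dictionaries kept by the pseudo proxy server. This is a minimal sketch under stated assumptions: the class `PseudoProxyMap`, the local domain `local-paper.com`, and the sequential `N.html` naming are illustrative choices, not taken from the invention as claimed.

```python
import itertools
from typing import Dict


class PseudoProxyMap:
    """Two-way table kept by the pseudo proxy server (hypothetical sketch)."""

    def __init__(self, local_domain: str) -> None:
        self.local_domain = local_domain
        self._counter = itertools.count(1)
        self._remote_to_local: Dict[str, str] = {}
        self._local_to_remote: Dict[str, str] = {}

    def remap(self, remote_id: str) -> str:
        """Translate a remote record identifier to a remapped local one."""
        if remote_id not in self._remote_to_local:
            local_id = f"http://{self.local_domain}/{next(self._counter)}.html"
            self._remote_to_local[remote_id] = local_id
            self._local_to_remote[local_id] = remote_id
        return self._remote_to_local[remote_id]

    def is_remapped(self, record_id: str) -> bool:
        """Is the selected record identifier one that this proxy remapped?"""
        return record_id in self._local_to_remote

    def original(self, record_id: str) -> str:
        """Return the remote record identifier behind a remapped one."""
        return self._local_to_remote[record_id]


proxy_map = PseudoProxyMap("local-paper.com")
remapped = proxy_map.remap("http://regional-paper.com/regional-today.html")
print(remapped)                         # http://local-paper.com/1.html
print(proxy_map.is_remapped(remapped))  # True
```

Sequential numbers are used here only for readability; the detailed description explains why remapped identifiers should be opaque to the user.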

In an enhancement of the present invention, there
is provided a method of providing pseudo proxy access
for tracking and controlling access to remote record
identifiers. The method comprises the steps of: providing
a first data set having rewritten record identifiers for
a remote record identifier to a local user; responding to
a request from the local user for a selected record
identifier; determining if the selected record identifier is a
rewritten record identifier; determining an actual record
identifier for the rewritten record identifier; and
requesting a second data set corresponding to the actual
record identifier from the interconnected computer system
network.

In another enhancement of the present invention,
the first data set comprises a HyperText Markup
Language based data set.

In a further enhancement of the present invention 
the remote record identifier comprises a uniform record 
locator. 

In yet a further enhancement, the present invention
comprises the additional steps of determining that a
record identifier is remote and rewriting the remote
record identifier.

In an enhancement of the present invention,
determining that the record identifier is remote
comprises the step of scanning a domain name of the
actual record identifier and comparing the domain name
to a local domain name, wherein the record identifier is
remote if the domain name is different from the local
domain name.
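
A minimal Python sketch of this comparison, assuming URLs as record identifiers. The function name `is_remote` is hypothetical; treating a relative reference (one with no domain name) as local, and requiring an exact match so that a subdomain such as `www.local-paper.com` counts as remote, are assumptions of the sketch.

```python
from urllib.parse import urlsplit


def is_remote(record_id: str, local_domain: str) -> bool:
    """A record identifier is remote if its domain name differs from the local one."""
    host = urlsplit(record_id).hostname   # lower-cased by urlsplit; None if absent
    if host is None:
        return False                      # assumption: relative references are local
    return host != local_domain.lower()


print(is_remote("http://regional-paper.com/regional-today.html", "local-paper.com"))  # True
print(is_remote("http://local-paper.com/index.html", "local-paper.com"))             # False
```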

In yet further enhancements of the present
invention, the step of determining an actual record
identifier for the rewritten record identifier comprises
looking up the actual record identifier by a predetermined
index, by a hashing table, by addressing a memory
location, by accessing an inode of a disk file, or by
accessing a disk file by a file name.

BRIEF DESCRIPTION OF THE DRAWINGS 


A more complete understanding of the present
invention may be obtained from consideration of the
following description in conjunction with the drawings in
which:


FIG. 1 is an overview of interconnected computer 
system networks employing the present invention; 
and 

FIG. 2 is a flow chart of the procedures of the
present invention for tracking local access and local
control by rewriting URLs.

DETAILED DESCRIPTION OF VARIOUS ILLUSTRATIVE
EMBODIMENTS

Although the present invention is particularly well 
suited for use as a URL rewriting pseudo proxy server 
for the WWW, and shall be described with respect to
this application, the methods and apparatus disclosed 
here can be applied to other schemes employing URLs 
as well as other types of resource location pointers and 
other record identifiers as links within an interconnected 
computer system network. 

The WWW allows a user to access a universe of
distributed information which combines text, audio,
graphics and animation within a hypermedia document.
Links are contained within a WWW document which
allow simple and rapid access to related documents.
The WWW provides an access system that enables
users to quickly access all types of information through a
common interface, removing the need to perform a
number of separate steps to access the information.
The WWW supports interfaces for access to HyperText,
UseNet news, WAIS, anonymous FTP, Telnet and
Gopher.

The WWW has HTML based pages, which contain 
pointers to other pages. The pointers are HyperText
links which are encoded with URLs. The URL contains 
a transmission protocol, such as HTTP, a domain name 
of the target computer system, and a page identifier. 

The HyperText links are simply references to other
documents, made up of two parts. The first part is a
reference to a related item such as a document, picture,
movie or sound. The item being referenced can be
within the current document, or it can be located
anywhere on the Internet. The second part is an anchor.
The anchor can be defined to be a word, a group of
words, a picture, or any area of the display. A reader
activates an anchor by pointing to it and clicking with a
mouse when using a graphical browser, or by selecting
it with the cursor (arrow) keys or tab keys when using a
text-based browser. Anchors can be indicated in the
displayed document by color, graphics, reverse video,
underline, as well as other formats.
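
In HTML, the two parts of a HyperText link correspond to the `href` attribute (the reference) and the enclosed content (the anchor) of an `<a>` element. The following sketch, using Python's standard `html.parser`, collects both. The class name `LinkCollector` is hypothetical, and only the text of an anchor is captured; picture anchors would need extra handling.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (reference, anchor text) pairs from <a href="..."> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")   # the reference part
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)                # the anchor part (text only)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed('<p>See <a href="http://regional-paper.com/regional-today.html">'
               'regional news</a> today.</p>')
collector.close()
print(collector.links)
# [('http://regional-paper.com/regional-today.html', 'regional news')]
```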

When an anchor is activated, the browser fetches 
the item referenced by the anchor. This may involve 
reading a document from a local disk drive, or request- 
ing over the Internet that a document be sent to the local 
computer. 

The standard way an item is referenced is by a
URL. The URL contains a complete description of the
item, which is made up of a protocol and an address. An
absolute address reference contains the complete
address, including domain name, directory path, and file
name. A relative address reference assumes that the
previous domain name and directory path are used.
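
The difference between absolute and relative address references can be seen with Python's standard `urllib.parse.urljoin`, which completes a reference against the URL of the current document. The base URL and file names below are hypothetical examples.

```python
from urllib.parse import urljoin

# URL of the document currently being displayed (hypothetical).
base = "http://local-paper.com/news/today.html"

# Relative reference: reuses the previous domain name and directory path.
print(urljoin(base, "sports.html"))          # http://local-paper.com/news/sports.html
# Relative reference with a rooted path: reuses only the domain name.
print(urljoin(base, "/archive/index.html"))  # http://local-paper.com/archive/index.html
# Absolute reference: used as-is.
print(urljoin(base, "http://regional-paper.com/regional-today.html"))
# http://regional-paper.com/regional-today.html
```

A link rewriter would need to resolve relative references in this way before comparing domain names, since a relative reference carries no domain name of its own.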

The URL is not limited to identifying WWW
HyperText files, but can also access other sets of data in
different protocols, including anonymous FTP, Gopher,
WAIS, UseNet news, and Telnet. The URL format is
typically P://A, where P is the protocol, such as HTTP
(HyperText Transfer Protocol), gopher, FTP (File Transfer
Protocol), WAIS (Wide Area Information Server), news
(UseNet newsgroups), or Telnet, and A is a valid Internet
host address or symbolic location.

To better understand the present invention, an
example embodiment shall be used in which a newspaper
consortium composed of individual members is
interconnected through the Internet. An individual
member may want to provide access to all of the
consortium member organizations, but would only track
its local subscribers.

Referring to FIG. 1, there is shown an overview of
interconnected computer system networks. Each
computer system network 8 and 10 contains a local
computer processor unit 12 which is coupled to a local data
storage unit 14. The local computer processor unit 12 is
selectively coupled to a plurality of local users 16. Each
of the computer processor units 12 is selectively
coupled to other computer processor units 12 through the
Internet 18. Local users 16 are also selectively coupled
directly to the Internet 18.

A local newspaper having a computer system
network 8, such as the Local Paper in Wyoming, may
allow a local user 16 to click into another computer
system network 10, such as a Regional Paper in New York,
through the Internet 18 to access a data storage unit 14
on the other computer system network 10. The Local
Paper computer system network 8 would handle the
billing for the local user 16 and provide an authentication
and reconciliation scheme with the Regional Paper
computer system network 10, permitting both papers to
profit from the venture.

The current technology utilized over the Internet,
specifically HTML based pages, does not provide a
suitable means for achieving the desired scheme. The
HTML based pages contain hyperlinks, encoded as
URLs, to other pages. If we assume that the Regional
Paper has a machine (domain) name of
regional-paper.com for its computer system network 10
and an HTML page about regional news today called
regional-today.html, a URL pointing to the regional news
today at the Regional Paper would be

http://regional-paper.com/regional-today.html

which allows access to the appropriate HTML page
through the Internet 18. In this case the URL acts as a
remote record identifier.

If this URL is included in an HTML page served by
the HTTP server of the computer system network 8 of the
Local Paper, the computer system network 8 of the
Local Paper would have no way of telling if or when the
local user 16 accessed the regional-today page on the
other computer system network 10. Selecting the URL
results in the other computer system network 10 of
regional-paper.com being accessed directly, and the
computer system network 8 of local-paper.com is not
involved in the access.

An elegant way to achieve tracking of access and
local control is to make all of the URLs local to the
local-paper.com machine, thus permitting the
local-paper.com machine to track and control access.
Referring to FIG. 2, there is shown a flow chart of the
procedures of the present invention which accomplish
tracking of local access and local control by rewriting the
URLs on the fly as they pass through the local-paper.com
machine on their way to the local user. In step 20, it is
determined on the fly if the actual URL is remote and if
it is to be rewritten. All remote URLs may be selected
for rewriting, or only selected groups of them. The
selection may be based upon the remote domain name,
which can be compared with a list of remote domain
names that are to be tracked, as well as upon other
comparison criteria. Thus, in step 22 the selected remote
URL

http://regional-paper.com/regional-today.html

would be rewritten as

http://local-paper.com/127.html

while the text and graphics on the HTML page would
remain the same.
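
Steps 20 and 22 can be sketched in Python as a substitution over the `href` attributes of the page being served. This is a minimal sketch, not the patented implementation: the function `rewrite_page`, the random hexadecimal tokens, and the use of a regular expression (a production rewriter would more likely use a full HTML parser) are all assumptions. Only link targets change; the text and graphics of the page pass through untouched.

```python
import re
import secrets
from typing import Dict, Set
from urllib.parse import urlsplit

# Matches href="..." or href='...' and captures prefix, quote and URL.
HREF = re.compile(r"""(href\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def rewrite_page(html: str, local_domain: str, tracked: Set[str],
                 rewrites: Dict[str, str]) -> str:
    """Rewrite links to tracked remote domains as opaque local URLs.

    `rewrites` maps token -> remote URL and is updated in place, so the
    local HTTP server can later look up the actual URL (step 30).
    """
    issued = {url: token for token, url in rewrites.items()}

    def swap(match: "re.Match") -> str:
        prefix, quote, url = match.groups()
        host = urlsplit(url).hostname
        # Step 20: leave local, relative and untracked links unchanged.
        if host is None or host == local_domain or host not in tracked:
            return match.group(0)
        # Step 22: reuse the token already issued for this URL, or make one.
        token = issued.get(url)
        if token is None:
            token = secrets.token_hex(8)
            while token in rewrites:        # guard against an unlikely repeat
                token = secrets.token_hex(8)
            rewrites[token] = url
            issued[url] = token
        return f"{prefix}{quote}http://{local_domain}/{token}.html{quote}"

    return HREF.sub(swap, html)


rewrites: Dict[str, str] = {}
page = ('<p>Regional news: <a href="http://regional-paper.com/regional-today.html">'
        'today</a>, <a href="/weather.html">local weather</a></p>')
print(rewrite_page(page, "local-paper.com", {"regional-paper.com"}, rewrites))
```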

In step 24 the local system sends the HTML page
containing the rewritten URLs to the local user. In step
26 the local user clicks on (selects) a URL on the HTML
page, thus requesting the document 127.html from the
local-paper.com machine. In step 28 the local HTTP
server determines if this is a rewritten URL. If the URL
is rewritten, step 30 looks up the actual URL, and step
32 sends the HTML page from the regional-paper.com
machine. If the URL was not rewritten, step 30 is
skipped. It is highly desirable that the rewritten URLs be
"blind" and not easily decoded, so that a user cannot
easily defeat the rewriting mechanism. After step 30 the
procedure can repeat again from step 20.
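
Steps 26 through 32 can be sketched as a request handler on the local HTTP server. In this minimal Python sketch the names `handle_request`, `serve_local` and `fetch_remote` are hypothetical, the token is assumed to be the file name of the requested path (as in `127.html`), and the two callables stand in for the local file service and for an outbound HTTP request; `fetch_with_urllib` shows one possible `fetch_remote` built on the standard library.

```python
import urllib.request
from typing import Callable, Dict, List, Optional, Tuple


def fetch_with_urllib(url: str) -> bytes:
    """One possible fetch_remote: an outbound HTTP GET with the standard library."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def handle_request(path: str, rewrites: Dict[str, str],
                   serve_local: Callable[[str], bytes],
                   fetch_remote: Callable[[str], bytes],
                   access_log: Optional[List[Tuple[str, str]]] = None) -> bytes:
    """Serve a request that arrived at the local (pseudo proxy) server."""
    name = path.rsplit("/", 1)[-1]
    token = name[:-len(".html")] if name.endswith(".html") else name
    remote_url = rewrites.get(token)            # step 28: is this a rewritten URL?
    if remote_url is None:
        return serve_local(path)                # not rewritten: serve as usual
    if access_log is not None:
        access_log.append((token, remote_url))  # the local site sees the access
    return fetch_remote(remote_url)             # steps 30-32: look up and fetch
```

In a fuller sketch, the fetched page could itself be passed back through the link rewriting of steps 20 and 22 before being sent, which corresponds to repeating the procedure from step 20.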

By serving up the HTML page from regional-paper.com,
the local-paper.com machine is acting as a pseudo
proxy. Proxy servers are often employed in environments
that contain firewalls. There, the proxy acts on behalf of
the user through the firewall, directing all HTTP access
through it. It is not desirable to supply proxy service to
every user: many users access the Internet through a
corporate firewall, and it is desirable to leave the user's
environment(s) unchanged. The URL rewrite scheme
does this by being completely transparent to the end
user. URLs for links that are not to be tracked are not
rewritten and behave as usual.

The proxy server in the rewrite scheme is a pseudo
proxy, or domain specific proxy, in that the server only
acts as a proxy for the HTML pages that it is hosting and
the pages that they point to. Typically, proxies have
either all or no requests sent through them. In the
present invention, with the pseudo proxy, only the
requests in its domain are served. The conversion of the
original remote URL to a local/pseudo proxy based URL
can be implemented efficiently. The rewriting of the URLs
is a remapping of selected record identifiers from one
domain to another domain (between a local and a
remote domain).

First, the URL is recognized as a remote URL, which
is shown as step 20. This can be accomplished by
scanning the domain name part of the URL. If the domain
name is remote when compared to a local domain name,
or when compared to a predetermined table of domain
names that are to be tracked, the remote URL is
replaced by an opaque local URL, which is shown as
step 22. An opaque URL is one from which the user
cannot easily generate or reconstruct the remote URL, as
this would subvert the process. This can be
accomplished by using indices that are private to the HTTP
server. The indices can be generated from a local
register, an incremented integer, a memory address at
which the string is stored in a database, the inode of a
disk file, or a simple disk file name.
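
A minimal Python sketch of issuing opaque local URLs, assuming an in-memory table private to the HTTP server. The class name `OpaqueIndex` is hypothetical. It offers the incremented-integer index source mentioned above and, as an alternative not named in the text, random tokens from Python's `secrets` module: an integer index hides the remote URL but lets a user guess neighbouring indices, whereas a random token is hard to guess as well.

```python
import itertools
import secrets
from typing import Dict, Optional


class OpaqueIndex:
    """Private table from opaque tokens to remote URLs (hypothetical sketch)."""

    def __init__(self, local_domain: str, random_tokens: bool = True) -> None:
        self.local_domain = local_domain
        self.random_tokens = random_tokens
        self._counter = itertools.count(1)      # incremented-integer index source
        self._by_token: Dict[str, str] = {}     # token -> remote URL (kept private)
        self._by_url: Dict[str, str] = {}       # remote URL -> token

    def _new_token(self) -> str:
        if not self.random_tokens:
            return str(next(self._counter))
        token = secrets.token_urlsafe(12)
        while token in self._by_token:          # guard against an unlikely repeat
            token = secrets.token_urlsafe(12)
        return token

    def local_url(self, remote_url: str) -> str:
        """Replace a remote URL by an opaque local URL (step 22)."""
        token = self._by_url.get(remote_url)
        if token is None:
            token = self._new_token()
            self._by_token[token] = remote_url
            self._by_url[remote_url] = token
        return f"http://{self.local_domain}/{token}.html"

    def remote_url(self, token: str) -> Optional[str]:
        """Recover the remote URL for a token, or None if it was never issued."""
        return self._by_token.get(token)


index = OpaqueIndex("local-paper.com", random_tokens=False)
print(index.local_url("http://regional-paper.com/regional-today.html"))
# http://local-paper.com/1.html
```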

The conversion of the proxy URL back to the remote
URL can be done by using indices. The number is an
index into an array where the actual remote URL is
stored, utilizing a minimal perfect hash. Hashing is a
technique for arranging a set of items, in which a hash
function is applied to the key of each item to determine its
hash value. The hash value identifies each item's primary
position in a hash table, and if this position is already
occupied, the item is inserted either into an overflow
table or in another available position in the table. A
minimal perfect hash avoids such collisions altogether: it
maps each of the n keys to a distinct position in a table
of exactly n entries.
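
The minimal perfect hash can be sketched with the "hash and displace" technique: keys are first grouped into buckets by one hash function, and each bucket is then given a displacement value (a seed for a second hash) that places all of its keys into still-empty slots. This is an illustrative Python sketch under assumptions, not the construction used in the invention: the FNV-1-style hash, the function names `build_mph` and `mph_lookup`, and storing each token next to its remote URL are choices of the sketch. Because a minimal perfect hash sends any string, including one that was never issued, to some slot, the lookup compares the stored token before answering.

```python
from typing import Dict, List, Optional, Tuple

FNV_PRIME = 0x01000193


def fnv_hash(seed: int, key: str) -> int:
    """FNV-1-style 32-bit hash; a different seed gives a different hash function."""
    h = seed if seed else FNV_PRIME
    for byte in key.encode("utf-8"):
        h = ((h * FNV_PRIME) ^ byte) & 0xFFFFFFFF
    return h


def build_mph(table: Dict[str, str]) -> Tuple[List[int], list]:
    """Place each (token, remote URL) pair in its own slot of an n-entry array."""
    n = len(table)
    displace = [0] * n                    # per bucket: seed (> 0) or -(slot + 1)
    slots: list = [None] * n              # slot -> (token, remote URL)
    buckets: List[List[str]] = [[] for _ in range(n)]
    for key in table:
        buckets[fnv_hash(0, key) % n].append(key)
    buckets.sort(key=len, reverse=True)   # place the largest buckets first

    i = 0
    # Buckets holding 2+ keys: try seeds until every key lands in a free slot.
    while i < n and len(buckets[i]) > 1:
        bucket, seed = buckets[i], 1
        while True:
            chosen = [fnv_hash(seed, k) % n for k in bucket]
            if len(set(chosen)) == len(bucket) and all(slots[s] is None for s in chosen):
                break
            seed += 1
        displace[fnv_hash(0, bucket[0]) % n] = seed
        for key, s in zip(bucket, chosen):
            slots[s] = (key, table[key])
        i += 1

    # Buckets holding one key: record one of the remaining free slots directly.
    free = [s for s in range(n) if slots[s] is None]
    while i < n and buckets[i]:
        key = buckets[i][0]
        s = free.pop()
        displace[fnv_hash(0, key) % n] = -s - 1
        slots[s] = (key, table[key])
        i += 1
    return displace, slots


def mph_lookup(displace: List[int], slots: list, token: str) -> Optional[str]:
    """Return the remote URL stored for `token`, or None if it was never issued."""
    n = len(slots)
    if n == 0:
        return None
    d = displace[fnv_hash(0, token) % n]
    s = -d - 1 if d < 0 else fnv_hash(d, token) % n
    entry = slots[s]
    # Every string maps to some slot, so confirm that the stored token matches.
    return entry[1] if entry is not None and entry[0] == token else None


table = {"127": "http://regional-paper.com/regional-today.html",
         "128": "http://regional-paper.com/sports.html"}
displace, slots = build_mph(table)
print(mph_lookup(displace, slots, "127"))  # http://regional-paper.com/regional-today.html
print(mph_lookup(displace, slots, "999"))  # None
```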

The indices also provide a simple way of tracking
access to the remote URLs, with the level of tracking
detail limited only by the level of detail that is recorded.
Further, the indices can be utilized to determine if
access to the remote URL is to be granted or denied,
which may depend upon the particular status or identity
of a local user.
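
A minimal Python sketch of tracking and controlling access through the indices. The class `AccessControl`, user-status strings such as `"subscriber"`, and the rule that an index with no entry in `allowed_status` is open to everyone are hypothetical; what is recorded (here only per-user counts of granted and denied requests) sets the level of tracking detail.

```python
from collections import Counter
from typing import Dict, Optional, Set


class AccessControl:
    """Hypothetical per-index tracking and access policy for rewritten URLs."""

    def __init__(self, remote_urls: Dict[int, str],
                 allowed_status: Optional[Dict[int, Set[str]]] = None) -> None:
        self.remote_urls = remote_urls              # index -> remote URL
        self.allowed_status = allowed_status or {}  # index -> statuses permitted
        self.granted: Counter = Counter()           # (user, index) -> granted count
        self.denied: Counter = Counter()            # (user, index) -> denied count

    def request(self, index: int, user: str, status: str) -> Optional[str]:
        """Return the remote URL if access is granted, else None; record either way."""
        url = self.remote_urls.get(index)
        allowed = self.allowed_status.get(index)
        if url is None or (allowed is not None and status not in allowed):
            self.denied[(user, index)] += 1
            return None
        self.granted[(user, index)] += 1
        return url


gate = AccessControl({127: "http://regional-paper.com/regional-today.html"},
                     allowed_status={127: {"subscriber"}})
print(gate.request(127, "alice", "subscriber"))  # http://regional-paper.com/regional-today.html
print(gate.request(127, "bob", "guest"))         # None
```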

An alternative scheme is one in which the name is
the number of a memory address or a key stored in a
database. Another alternative scheme is to utilize the
disk inode, which requires that the inode be looked up in
the disk's inode table. When a disk file name is used, the
named file, which can contain the remote URL, is
opened.
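
The disk-file alternatives can be sketched with Python's standard `pathlib` and `os` modules. The `.url` file naming, the one-directory layout and the function names are hypothetical. Portable file APIs open files by name rather than by inode number, so the inode variant below simply scans the directory for the file whose `os.stat(...).st_ino` matches; a real server would more likely keep its own inode-to-file table.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def store_url(directory: Path, index: int, remote_url: str) -> int:
    """Write the remote URL into the file '<index>.url'; return the file's inode."""
    path = directory / f"{index}.url"
    path.write_text(remote_url + "\n", encoding="utf-8")
    return os.stat(path).st_ino


def lookup_by_file_name(directory: Path, index: int) -> Optional[str]:
    """Open the disk file named after the index and read the remote URL."""
    try:
        return (directory / f"{index}.url").read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None


def lookup_by_inode(directory: Path, inode: int) -> Optional[str]:
    """Find the file whose inode number matches, then read the remote URL."""
    for entry in directory.iterdir():
        if entry.is_file() and os.stat(entry).st_ino == inode:
            return entry.read_text(encoding="utf-8").strip()
    return None


with tempfile.TemporaryDirectory() as tmp:
    ino = store_url(Path(tmp), 127, "http://regional-paper.com/regional-today.html")
    print(lookup_by_file_name(Path(tmp), 127))  # http://regional-paper.com/regional-today.html
    print(lookup_by_inode(Path(tmp), ino))      # http://regional-paper.com/regional-today.html
```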

Numerous modifications and alternative embodiments
of the invention will be apparent to those skilled
in the art in view of the foregoing description.
Accordingly, this description is to be construed as illustrative
only and is for the purpose of teaching those skilled in
the art the best mode of carrying out the invention.
Details of the structure may be varied substantially
without departing from the spirit of the invention, and the
exclusive use of all modifications which come within the
scope of the appended claims is reserved.

Claims 

1. In an interconnected computer system network, a
method for real time remapping a remote domain to
a local domain, said method comprising the steps
of:

defining a pseudo proxy server; and
translating in said pseudo proxy server a
remote record identifier corresponding to said
remote domain to a remapped record identifier
corresponding to said local domain.

2. The method as recited in claim 1 comprising the
additional step of determining if a selected record
identifier is a selected remapped record identifier.

3. The method as recited in claim 2 comprising the
additional step of requesting a data set
corresponding to a selected remote record identifier
corresponding to said selected remapped record
identifier.


4. The method as recited in claim 1 comprising the 
additional step of tracking access of a local user to
said remote domain through said pseudo proxy 
server. 


5. The method as recited in claim 1 comprising the
additional step of restricting access of a local user 
to said remote domain through said pseudo proxy 
server. 


6. The method as recited in claim 1 wherein said 
remote record identifier comprises a uniform record 
locator. 

7. The method as recited in claim 1 wherein the step
of translating a remote record identifier comprises
looking up said remote record identifier by a prede- 
termined index. 

8. The method as recited in claim 7 wherein the step
of translating a remote record identifier further
comprises a hashing table.

9. The method as recited in claim 8 wherein said 
hashing table comprises a minimal perfect hash.

10. The method as recited in claim 7 wherein the step 
of translating a remote record identifier further
comprises addressing a memory location.


11. The method as recited in claim 7 wherein the step 
of translating a remote record identifier further com- 
prises accessing an inode of a disk file. 

12. The method as recited in claim 1 wherein the step
of translating a remote record identifier comprises
accessing a disk file by a file name.

13. The method as recited in claim 3 wherein said data
set comprises a hypertext transfer protocol request.

14. In an interconnected computer system network, a
method of tracking and controlling access to remote
record identifiers, said method comprising the steps
of:

providing a first data set having a rewritten
record identifier for a remote record identifier to
a local user;

responding to a request from said local user for
a selected record identifier;

determining if said selected record identifier is
a rewritten record identifier;

determining an actual record identifier for said
rewritten record identifier; and

requesting a second data set corresponding to
said actual record identifier from said
interconnected computer system network.

15. The method as recited in claim 14 wherein said first
data set comprises a hypertext markup language
based data set.

16. The method as recited in claim 14 wherein said
remote record identifier comprises a uniform record
locator.

17. The method as recited in claim 14 comprising the 
additional steps of determining that a record identi- 
fier is remote and rewriting said remote record iden- 
tifier. 

18. The method as recited in claim 17 wherein the 
steps of determining that a record identifier is 
remote comprises scanning a domain name of said 
actual record identifier and comparing said domain 
name to a local domain name wherein said record 
identifier is remote if said domain name is different 
than said local domain name. 

19. The method as recited in claim 14 wherein the step
of determining an actual record identifier for said
rewritten record identifier comprises looking up said 
actual record identifier by a predetermined index. 

20. The method as recited in claim 19 wherein the step
of determining an actual record identifier further 
comprises a hashing table. 

21. The method as recited in claim 20 wherein said
hashing table comprises a minimal perfect hash.

22. The method as recited in claim 19 wherein the step
of determining an actual record identifier further 
comprises addressing a memory location. 

23. The method as recited in claim 19 wherein the step
of determining an actual record identifier further
comprises accessing an inode of a disk file. 

24. The method as recited in claim 14 wherein the step
of determining an actual record identifier for said
rewritten record identifier comprises accessing a
disk file by a file name.

25. The method as recited in claim 24 wherein said disk
file contains a domain name of said actual record 
identifier. 

26. The method as recited in claim 24 wherein said disk
file contains said actual record identifier. 

27. The method as recited in claim 14 wherein said 
request from said local user comprises a hypertext 
transfer protocol request. 

28. In an interconnected computer system network a 
method of providing pseudo proxy access for track- 
ing and controlling access to remote uniform record 
locators, said method comprising the steps of: 

providing a hypertext markup language based
page having a rewritten uniform record locator
for a remote uniform record locator to a local
user;

responding to a request from said local user for
a selected uniform record locator;

determining if said selected uniform record
locator is a rewritten uniform record locator;

determining an actual uniform record locator
for said rewritten uniform record locator; and

requesting a second data set corresponding to
said actual uniform record locator from said
interconnected computer system network.

29. The method as recited in claim 28 comprising the 
additional steps of determining that a uniform 
record locator is remote by comparing a domain 
name of said uniform record locator to a local 
domain name, wherein said uniform record locator 
is remote if said domain name is different than said
local domain name and rewriting said remote uni- 
form record locator. 

30. The method as recited in claim 28 wherein the step
of determining an actual uniform record locator for
said rewritten uniform record locator comprises
looking up said actual uniform record locator by a
predetermined index.

31. The method as recited in claim 30 wherein the step
of determining an actual uniform record locator
further comprises a hashing table.

32. The method as recited in claim 31 wherein said 
hashing table comprises a minimal perfect hash. 

33. The method as recited in claim 28 wherein the step 
of determining an actual uniform record locator fur- 
ther comprises addressing a memory location. 

34. The method as recited in claim 28 wherein the step
of determining an actual uniform record locator
further comprises accessing an inode of a disk file.

35. The method as recited in claim 23 wherein the step
of determining an actual uniform record locator for
said rewritten uniform record locator comprises
accessing a disk file by a file name.

36. The method as recited in claim 28 comprising the
additional steps of determining that a uniform
record locator is remote by comparing a domain
name of said uniform record locator to a
predetermined table of domain names and rewriting
said remote uniform record locator.
